# 16 kHz audio processing

- **FocalCodec 25Hz** (lucadellalib, Apache-2.0; 25 downloads, 1 like)
  Low-bitrate speech codec based on a focal modulation network, supporting 16 kHz speech encoding.
  Tags: Speech Synthesis
- **Audio Emotion Detection** (Hatman, Apache-2.0; 630 downloads, 8 likes)
  Fine-tuned from facebook/wav2vec2-large-xlsr-53 for audio emotion detection; recognizes seven emotional states.
  Tags: Audio Classification, Transformers
- **Sentis Whisper Tiny** (unity, Apache-2.0; 253 downloads, 48 likes)
  Whisper-Tiny is a small automatic speech recognition (ASR) model from OpenAI, designed for speech-to-text tasks and packaged for Unity environments.
  Tags: Speech Recognition
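Most of the fine-tuned ASR checkpoints in this list can be loaded through the Hugging Face `transformers` pipeline API. A minimal sketch, assuming a CTC-style checkpoint; the model ID shown is illustrative, and any fine-tuned ASR model from this list can be substituted:

```python
import numpy as np
from transformers import pipeline

# Build an ASR pipeline; "facebook/wav2vec2-base-960h" is an illustrative
# CTC checkpoint, not one of the entries above.
asr = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h")

# The pipeline accepts a file path or a raw mono float32 array already
# sampled at the model's expected rate (16 kHz for these models).
audio = np.zeros(16000, dtype=np.float32)  # one second of silence as a stand-in
result = asr(audio)
print(result["text"])
```

The same call works for the multilingual XLSR fine-tunes below; only the model ID changes.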
- **Wav2vec2 French Phonemizer** (Cnam-LMSSC, MIT; 9,832 downloads, 7 likes)
  Fine-tuned for French speech-to-phoneme conversion, based on facebook/wav2vec2-base-fr-voxpopuli-v2 and trained on the Common Voice v13 dataset.
  Tags: Speech Recognition, Transformers, French
- **Wav2vec2 Large Vi Vlsp2020** (nguyenvulebinh; 385 downloads, 4 likes)
  Vietnamese automatic speech recognition model based on the wav2vec2 architecture, pre-trained on 13,000 hours of unlabeled YouTube audio and fine-tuned on 250 hours of labeled data.
  Tags: Speech Recognition, Transformers, Other
- **Wav2vec2 Conformer Rope Large 100h Ft** (facebook, Apache-2.0; 99 downloads, 0 likes)
  Wav2Vec2-Conformer model fine-tuned on 100 hours of LibriSpeech data, using rotary position embeddings.
  Tags: Speech Recognition, Transformers, English
- **Wav2vec2 Large 10min Lv60 Self** (Splend1dchan, Apache-2.0; 177 downloads, 0 likes)
  Large Wav2Vec2 model pre-trained on Libri-Light and fine-tuned on 10 minutes of LibriSpeech data with a self-training objective; expects 16 kHz speech input.
  Tags: Speech Recognition, Transformers, English
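Several of these cards note that the model expects 16 kHz input. Audio recorded at another rate has to be resampled first; a minimal sketch using scipy's polyphase resampler (the helper name and rates are illustrative):

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly

def to_16k(audio: np.ndarray, orig_sr: int, target_sr: int = 16000) -> np.ndarray:
    """Resample a mono signal to 16 kHz with a polyphase filter."""
    g = gcd(orig_sr, target_sr)
    return resample_poly(audio, target_sr // g, orig_sr // g)

# One second at 44.1 kHz becomes one second at 16 kHz.
x = np.random.randn(44100).astype(np.float32)
y = to_16k(x, 44100)
print(len(y))  # 16000 samples
```

Feeding audio at the wrong rate does not raise an error with most of these models; it silently degrades accuracy, so resampling is worth doing explicitly.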
- **Data2vec Audio Large 10m** (facebook, Apache-2.0; 19 downloads, 0 likes)
  Data2Vec is a general self-supervised learning framework for speech, vision, and language. This large audio model is pre-trained and fine-tuned on 10 minutes of LibriSpeech data; expects 16 kHz speech input.
  Tags: Speech Recognition, Transformers, English
- **Data2vec Audio Large** (facebook, Apache-2.0; 97 downloads, 1 like)
  Large Data2Vec model pre-trained on 16 kHz speech audio with a self-supervised learning framework, suitable for tasks such as speech recognition.
  Tags: Speech Recognition, Transformers, English
- **Wav2vec2 Large Xlsr 53 Ukrainian** (anton-l, Apache-2.0; 21 downloads, 1 like)
  Ukrainian automatic speech recognition (ASR) model fine-tuned from facebook/wav2vec2-large-xlsr-53 on the Common Voice dataset.
  Tags: Speech Recognition, Other
- **Data2vec Audio Base 100h** (facebook, Apache-2.0; 4,369 downloads, 1 like)
  Base Data2Vec audio model pre-trained and fine-tuned on 100 hours of LibriSpeech audio.
  Tags: Speech Recognition, Transformers, English
- **Hubert Xlarge Ll60k** (facebook, Apache-2.0; 3,874 downloads, 5 likes)
  HuBERT is a self-supervised speech representation model that learns joint acoustic and linguistic representations through a BERT-like predictive loss.
  Tags: Speech Recognition, Transformers, English
- **Wav2vec2 Large Xlsr Turkish** (cahya, Apache-2.0; 61 downloads, 2 likes)
  Turkish automatic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53 on the Turkish Common Voice dataset, reaching a test WER of 21.13%.
  Tags: Speech Recognition, Other
- **Wav2vec2 Base Sl Voxpopuli V2** (facebook; 31 downloads, 0 likes)
  Wav2Vec2 base model pretrained for Slovenian (sl) on 11.3k hours of unlabeled data from the VoxPopuli corpus.
  Tags: Speech Recognition, Transformers, Other
- **Wav2vec2 Base Pt Voxpopuli V2** (facebook; 30 downloads, 0 likes)
  Wav2Vec2 base model pretrained on the Portuguese portion of the VoxPopuli corpus, suitable for speech recognition tasks.
  Tags: Speech Recognition, Transformers, Other
- **Wav2vec2 Large Slavic Voxpopuli V2** (facebook; 26 downloads, 0 likes)
  Wav2Vec2 large model pre-trained on 89k hours of unlabeled data from the Slavic-language subset of the VoxPopuli corpus.
  Tags: Speech Recognition, Transformers
- **Wav2vec2 Large Baltic Voxpopuli V2** (facebook; 25 downloads, 0 likes)
  Wav2Vec2 large model pre-trained on 27.5k hours of unlabeled data from the Baltic-language subset of the VoxPopuli corpus.
  Tags: Speech Recognition, Transformers
- **Wav2vec2 Base Pl Voxpopuli V2** (facebook; 30 downloads, 0 likes)
  Wav2Vec2 base model pretrained for Polish on the VoxPopuli corpus, suitable for speech recognition tasks.
  Tags: Speech Recognition, Transformers, Other
- **Greek Lsr 1** (skylord, Apache-2.0; 17 downloads, 0 likes)
  Greek automatic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53.
  Tags: Speech Recognition, Transformers, Other
- **Wav2vec2 Large Xlsr 53 Rm Vallader** (anuragshas, Apache-2.0; 58 downloads, 0 likes)
  Speech recognition model for the Romansh Vallader dialect, fine-tuned from facebook/wav2vec2-large-xlsr-53; reaches a word error rate of 32.89%.
  Tags: Speech Recognition
- **Wav2vec2 Large Romance Voxpopuli V2** (facebook; 26 downloads, 0 likes)
  Wav2Vec2 large model pretrained only on 101.5k hours of unlabeled data from the Romance-language subset of the VoxPopuli corpus, suitable for speech recognition tasks.
  Tags: Speech Recognition, Transformers
- **Wav2vec2 Large Mt Voxpopuli V2** (facebook; 25 downloads, 0 likes)
  Wav2Vec2 large model pretrained exclusively on unlabeled Maltese (mt) data from the VoxPopuli corpus, suitable for speech recognition tasks.
  Tags: Speech Recognition, Transformers, Other
- **Xlsr Indonesia** (acul3, Apache-2.0; 23 downloads, 0 likes)
  Indonesian automatic speech recognition (ASR) model fine-tuned on the XLSR architecture using the Common Voice Indonesian dataset.
  Tags: Speech Recognition, Transformers, Other
- **Hubert Base Ls960** (facebook, Apache-2.0; 406.60k downloads, 55 likes)
  HuBERT is a self-supervised speech representation model that learns speech features through a BERT-like predictive loss, suitable for tasks such as speech recognition.
  Tags: Speech Recognition, Transformers, English
- **Wav2vec2 Base Sk Voxpopuli V2** (facebook; 31 downloads, 0 likes)
  Wav2Vec2 base model pretrained on Slovak data from the VoxPopuli corpus, suitable for speech recognition tasks.
  Tags: Speech Recognition, Transformers, Other
- **Wav2vec2 Base Sv Voxpopuli V2** (facebook; 30 downloads, 0 likes)
  Wav2Vec2 base model pre-trained for Swedish on 16.3k hours of unlabeled data from the VoxPopuli corpus.
  Tags: Speech Recognition, Transformers, Other
- **Wav2vec2 Base Cs Voxpopuli V2** (facebook; 33 downloads, 1 like)
  Wav2Vec2 base model pretrained on the VoxPopuli corpus, specialized for Czech speech processing.
  Tags: Speech Recognition, Transformers, Other
- **Wav2vec2 Base Fi Voxpopuli V2** (facebook; 29 downloads, 1 like)
  Wav2Vec2 base model pre-trained for Finnish, suitable for speech recognition tasks.
  Tags: Speech Recognition, Transformers, Other
- **Wav2vec2 Large Xlsr 53 French** (jonatasgrosman, Apache-2.0; 47.83k downloads, 11 likes)
  French speech recognition model fine-tuned from the XLSR-53 large model on the Common Voice dataset, supporting high-accuracy French speech-to-text conversion.
  Tags: Speech Recognition, French
- **Wav2vec2 Large Xlsr Persian** (m3hrdadfi, Apache-2.0; 562 downloads, 16 likes)
  Persian (Farsi) automatic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53; expects 16 kHz audio input.
  Tags: Speech Recognition, Other
- **Wav2vec2 Large Xlsr Georgian** (m3hrdadfi, Apache-2.0; 66 downloads, 5 likes)
  Georgian automatic speech recognition (ASR) model fine-tuned from facebook/wav2vec2-large-xlsr-53 on the Common Voice dataset.
  Tags: Speech Recognition, Other
- **Wav2vec2 Base Et Voxpopuli V2** (facebook; 30 downloads, 0 likes)
  Wav2Vec2 base model pretrained specifically for Estonian.
  Tags: Speech Recognition, Transformers, Other
- **Wav2vec2 Large Xlsr Pt** (gchhablani, Apache-2.0; 29 downloads, 0 likes)
  Portuguese automatic speech recognition (ASR) model fine-tuned from facebook/wav2vec2-large-xlsr-53, reaching a 17.22% word error rate (WER) on the Common Voice Portuguese dataset.
  Tags: Speech Recognition, Other
- **Wav2vec2 Base En Voxpopuli V2** (facebook; 35 downloads, 1 like)
  Wav2Vec2 base model pre-trained on 24.1k hours of unlabeled English data from the VoxPopuli corpus, suitable for speech recognition tasks.
  Tags: Speech Recognition, Transformers, English
- **Wav2vec2 Large Xlsr 53 Tatar** (anton-l, Apache-2.0; 25 downloads, 1 like)
  Tatar speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53 on the Tatar Common Voice dataset.
  Tags: Speech Recognition, Other
- **Wav2vec2 Base De Voxpopuli V2** (facebook; 44 downloads, 1 like)
  Wav2Vec2 base model pretrained for German on 23.2k hours of unlabeled German data from the VoxPopuli corpus.
  Tags: Speech Recognition, Transformers, German
- **Wav2vec2 Base Nl Voxpopuli V2** (facebook; 22 downloads, 0 likes)
  Wav2Vec2 base model pretrained for Dutch on 19.0k hours of unlabeled data from the VoxPopuli corpus.
  Tags: Speech Recognition, Transformers, Other
- **Romanian Wav2vec2** (gigant, Apache-2.0; 88.90k downloads, 6 likes)
  Romanian speech recognition model fine-tuned from facebook/wav2vec2-xls-r-300m on Common Voice 8.0 and Romanian speech synthesis datasets; ranked first for Romanian in the Hugging Face Robust Speech Challenge.
  Tags: Speech Recognition, Transformers, Other
- **Wav2vec2 Large Uralic Voxpopuli V2** (facebook; 46 downloads, 0 likes)
  Wav2Vec2 large model pre-trained on 42.5k hours of unannotated Uralic-language data from the VoxPopuli corpus.
  Tags: Speech Recognition, Transformers
- **Wav2vec2 Base Lt Voxpopuli V2** (facebook; 31 downloads, 0 likes)
  Wav2Vec2 base model pretrained for Lithuanian on 14.4k hours of unlabeled data from the VoxPopuli corpus.
  Tags: Speech Recognition, Transformers, Other
© 2025 AIbase